Tools for Efficient Workflows
Part 2: Version control

Lukas Lehner and Maximilian Trenkmann

2023-01-12

Our plan

  1. Automatable reports

  2. Version control

  3. Dissemination and academic websites

  4. Containerisation for reproducible environments

  5. Encryption and advanced programming

What did we do so far?

  • Automatable reports

Session 2: Version control

  • Part I: Version control
  • Part II: GitHub and GitHub Desktop
  • Part III: Collaborating with GitHub

Readings with further information

Part I: Version control

Why GitHub? The analogy of climbing

Source: illustrations by @allison_horst

Git

  • Who has used Git previously? What for?
  • Who has used any kind of version control previously?

Git as Version Control

You all have used version control previously:

  • “Save early, save often”.
  • Easiest version control: the back-button.
  • “Track-changes” in MS Word is a rudimentary form of version control.

 

Git is a sophisticated form of version control. Git…

  • … maintains a single, updated version of each file.
  • … keeps a record of all previous versions.
  • … keeps a record of exact changes between the versions.
  • … lets collaborators work simultaneously.
  • … documents who made changes, when and why.

Why should I learn yet another tool?

Why should I learn yet another tool? Git as Version Control

  • Maintain an overview
  • Access previous versions

 

  • Strengthen Collaboration
  • Foster Transparency

 

  • Next:
  • Create an academic website
  • Create an academic CV
  • → Qmd and Rmd allow us to use the same document for both

Git: a preview

Why Git? 1) Streamline your work

Source: illustrations by @allison_horst

Why Git? 2) Support others and yourself

Source: illustrations by @allison_horst

Why Git? 3) Reuse your work

Source: illustrations by @allison_horst

Why Git? 4) Contribute earlier

Source: illustrations by @allison_horst

Why Git? 5) Fail safely

Source: illustrations by @allison_horst

Why Git? Summary

  1. Streamline your work
  2. Support others and yourself
  3. Reuse your work
  4. Contribute earlier
  5. Fail safely

Git: Terminology (1): pillars

  • repository (repo)
  • commit
  • diff

 

  • branch
  • remote
  • local
  • commit message and tag
  • gist
  • README

Git: Terminology (1): pillars

  • repository (repo): directory of files
  • commit: snapshot of directory
  • diff: difference between two commits

 

  • branch: detour from main stream without changing main stream
  • remote: repo hosted online
  • local: repo on your hard drive (offline)
  • commit message and tag: notes assigned to commits
  • gist: small repo to share one code file
  • README: “About me” section of your repository or your GitHub profile
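
A minimal sketch of how these terms map onto commands, run in a throwaway directory. The file name notes.txt, the branch name experiment, the tag v0.1 and the demo identity are placeholders chosen for illustration, not part of the course material.

```shell
set -e
# Work in a throwaway directory so nothing on your machine is touched.
cd "$(mktemp -d)"

# repository: turn the directory into a repo (-b needs Git 2.28 or newer)
git init -q -b main
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# commit: take a first snapshot of the directory
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# commit message and tag: label this snapshot
git tag -a v0.1 -m "First draft"

# a second snapshot
echo "second line" >> notes.txt
git commit -q -am "Extend notes"

# diff: difference between the two commits
git --no-pager diff HEAD~1 HEAD

# branch: a detour that leaves main untouched
git checkout -q -b experiment
echo "risky idea" >> notes.txt
git commit -q -am "Try an idea"

commits_on_main=$(git rev-list --count main)
commits_on_experiment=$(git rev-list --count experiment)
echo "main: $commits_on_main commits, experiment: $commits_on_experiment commits"
```

The commit on experiment does not appear on main until the two branches are merged.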

Git: Terminology (2): actions

  • to commit
  • to merge
  • to fork
  • to clone
  • to push
  • to pull

Git: Terminology (2): actions

  • to commit: create a commit
  • to merge: merge one branch into another branch
  • to fork: create a copy of someone else’s repo in your GitHub account
  • to clone: create your local copy of the repo
  • to push: upload changes from your local to your remote
  • to pull: update local from remote
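
The actions can be tried offline with a sketch like the one below. It assumes a bare repository in a local folder as a stand-in for the remote on GitHub; the names alice, bob, remote.git and add-analysis are made up. Forking is a GitHub feature and is therefore not shown.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity for every commit made in this script.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A bare repo in a local folder plays the role of the remote (-b needs Git 2.28+).
git init -q --bare -b main remote.git

# Alice starts the project locally.
git init -q -b main alice
cd alice
echo "# Project" > README.md
git add README.md
git commit -q -m "Add README"            # to commit
git remote add origin ../remote.git
git push -q -u origin main               # to push
cd ..

# Bob gets his own local copy.
git clone -q remote.git bob              # to clone
cd bob
git checkout -q -b add-analysis
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"
git checkout -q main
git merge -q --no-edit add-analysis      # to merge
git push -q origin main
cd ..

# Alice updates her local repo from the remote.
cd alice
git pull -q --ff-only origin main        # to pull
ls
```

After the pull, Alice's folder contains the script that Bob committed.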

To recap at home

Lab Session I 👩‍💻 👨‍💻

Simulation to learn using basic Git commands

  • Try to complete the online exercises to understand the git logic:
    • Main: Introduction Sequence: (first 4 exercises)
    • Remote: Push & Pull: (first 4 exercises)

🎁 Bonus

  • Main: Ramping Up
  • Main: Moving Work Around
  • Main: A Mixed Bag
  • Remote: To Origin And Beyond – Advanced Git Remotes!
25:00

Part II: GitHub and GitHub Desktop

Git basics

Generally, git operates through a shell. (Later on, we will install a GUI that can make life easier.)

What is a shell?

A shell (or terminal) is a program on your computer whose job is to run other programs, rather than do calculations itself.

Let’s start by opening the shell in RStudio: Tools > Shell.

A note for Windows users: git commands are not available in the default Windows shell out of the box. We can solve this by installing Git for Windows, which includes Git Bash, a lightweight shell that supports git commands.

Git: Terminology (3): what’s the difference?

Git: Terminology (3): tools

Git is the command line version control system (VCS) software, which works on your local computer.

GitHub is an internet hosting service for git repositories.

GitHub Desktop is an application that enables you to interact with GitHub using a GUI instead of the command line or a web browser.

GitHub Desktop: Getting started

GitHub Desktop: track version history

Using GitHub and RStudio integrated

Pro Tip

Common mistake

  • You don’t remember whether you have Git installed previously.
  • Solution: You may be able to find Git using which git (in a bash shell) or where git (in a Windows shell)
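
A small sketch of that check for a bash-like shell. It uses command -v, a standard alternative to which; that substitution is ours, not from the slide.

```shell
# Check whether Git is installed and, if so, where it lives and which version it is.
if command -v git >/dev/null 2>&1; then
  git_status="installed"
  echo "Git found at: $(command -v git)"
  git --version
else
  git_status="missing"
  echo "Git was not found on the PATH; install it first."
fi
```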

My two cents 💰

  • Do not create a Git repo inside your Git repo

  • Git repos are not supposed to go into Google Drive, DropBox, or OneDrive, but…

Additional Tips

Resetting

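# For when you made a mistake in your last commit (like the commit message) and need to undo it without losing any changes.
# The commit is removed, but your edits stay in the working directory. HEAD~1 means the same as HEAD^.
git reset HEAD~1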

Reverting

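# For when you made a mistake and already pushed it to a remote repository.
# Creates a new commit that undoes the commit with the given hash (b68cb2dc is an example).
git revert b68cb2dc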
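
To see the difference between resetting and reverting, here is a sketch in a throwaway repo. The file report.txt, the commit messages and the demo identity are invented for the example.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "v1" > report.txt
git add report.txt
git commit -q -m "Add report"

# A commit with a typo in its message
echo "v2" > report.txt
git commit -q -am "Updaet report"

# Resetting: the last commit disappears, the edit stays in the working directory
git reset -q HEAD~1
content_after_reset=$(cat report.txt)
git commit -q -am "Update report"

# Reverting: a new commit undoes the previous one, the history stays intact
git revert --no-edit HEAD >/dev/null
content_after_revert=$(cat report.txt)

git --no-pager log --oneline
commit_count=$(git rev-list --count HEAD)
```

Resetting rewrites local history, so it suits commits that were never pushed. Reverting only adds a commit, which is why it is the safer choice once others may have pulled your work.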

Stashing

# For when you need to stash away changes for later but do not want to make a commit.
# save current changes to stash (-u includes new files)
git stash -u
# apply stash changes and delete stash
git stash pop
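
A sketch of a full stash round trip in a throwaway repo, assuming the made-up files paper.md and ideas.txt:

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "draft" > paper.md
git add paper.md
git commit -q -m "Add paper"

# Unfinished work: one modified tracked file and one new, untracked file
echo "half-finished paragraph" >> paper.md
echo "todo" > ideas.txt

# Put everything aside; the working directory is clean again
git stash -u >/dev/null
status_while_stashed=$(git status --porcelain)

# Bring the changes back and delete the stash entry
git stash pop >/dev/null
status_after_pop=$(git status --porcelain)
echo "$status_after_pop"
```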

Additional Tips

checkout

# new branch
git checkout -b new_branch_name
# checkout existing branch
git checkout existing_branch_name
# reset file to a previous version
git checkout b68cb2dc1a53275dd779391bcac96a96c559b894 -- file_name.md
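
The three uses of checkout can be combined, for example to repair a file on a separate branch. This is a sketch in a throwaway repo; the branch name fix_file and the file contents are invented, and the hash is read from the repo instead of being typed by hand.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "good version" > file_name.md
git add file_name.md
git commit -q -m "Add file"
good_commit=$(git rev-parse HEAD)         # full hash of the commit to return to

echo "broken version" > file_name.md
git commit -q -am "Break file"

# new branch for the repair, main stays as it is
git checkout -q -b fix_file

# reset the file to the version stored in the earlier commit
git checkout "$good_commit" -- file_name.md
restored=$(cat file_name.md)

# the restored file is already staged, so it can be committed directly
git commit -q -m "Restore file_name.md"
git --no-pager log --oneline
```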

Publishing Code

Sharing code publicly can be very useful but poses some challenges. To avoid security issues or problems for you and other users, consider the following tips.

Publishing Code > Security

When you publish, start a new git repository and copy your current code into it. Your old git history might contain information you don’t want to share (passwords, private access tokens, “unprofessional” text or code from the early days of a project).
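
Why a fresh repository helps can be sketched as follows. The token is fake and the folder names private_project and public_project are placeholders.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity for every commit made in this script.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q -b main private_project       # -b needs Git 2.28 or newer
cd private_project
echo 'token = "not-a-real-token"' > config.txt
git add config.txt
git commit -q -m "Add config"
echo 'token = ""' > config.txt
git commit -q -am "Remove token"

# The token is gone from the current file but still sits in the history.
# -S lists the commits that added or removed the given string.
commits_with_secret=$(git log --oneline -S"not-a-real-token" | wc -l)

# Publish from a fresh repo: copy the current files, not the .git folder.
cd ..
mkdir public_project
cp private_project/config.txt public_project/
cd public_project
git init -q -b main
git add .
git commit -q -m "Initial public release"
public_commits_with_secret=$(git log --oneline -S"not-a-real-token" | wc -l)

echo "private history: $commits_with_secret commits touch the token"
echo "public history: $public_commits_with_secret commits touch the token"
```

A token that was ever committed should still be treated as leaked and replaced, since old clones may contain it.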

Publishing Code > README.md

Add a “Usage” and “Contributing” section to your README.md

  • Add a sentence or two on the WHY of the project
  • Add a section “Usage” on how to install/use your project
  • Have a simple and short code example showcasing how to use the project
  • Explain the basic project structure
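
One possible skeleton, written from the shell. The project name myproject, the folder data/ and the script analysis.R are hypothetical; adapt the sections to your project.

```shell
set -e
cd "$(mktemp -d)"

# Write a README.md with the sections suggested above.
cat > README.md <<'EOF'
# myproject

One or two sentences on why this project exists.

## Usage

How to install and run the project, with one short example:

    Rscript analysis.R

## Project structure

- `data/`: raw input files
- `analysis.R`: main script

## Contributing

How others can report issues or propose changes.
EOF

section_count=$(grep -c '^## ' README.md)
echo "README.md written with $section_count sections"
```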

Publishing Code > Add a License

  • MIT
    • pro: easy to understand and use
    • con: organisations and individuals can use your code without contributing back
  • GPLv3
    • pro: organisations and individuals who distribute software built on your code have to release their changes under the same license
    • con: not as easy to understand; some organisations do not want to use software that obliges them to share their own code
  • Creative Commons
    • pro: Allows you to customize non-commercial or commercial usage and whether it can be used without or with attribution
    • con: the many versions lead to most people not knowing them and ignoring the license

Lab Session II 👩‍💻 👨‍💻

Get started on GitHub

  • Initialize a repo and push it to the remote.
  • Commit a change to your repo and push it.
  • Find another user and clone her repo.
20:00

Part III: Using Git and collaborating with GitHub

Collaboration in teams

How to collaborate? Two common workflows

  1. Shared repo workflow
  • For small projects where you are basically in the same physical space (e.g. lab with offices near each other).
  • Be careful! You are cloning the main repository.
  • Everyone has push and pull access to the central repo, so be careful and:
    • Never commit to master directly.
    • Always do your work on a different branch from master.
  2. Fork and pull workflow
  • This model is used by larger teams.
  • The “owner”/“Project Leader” of the upstream repo assigns rights to “Collaborators”.
  • Collaborators do not have push access to main (upstream) repo.
  • Project Lead accepts Pull Requests (PRs) from collaborators, reviews them, then merges them into main repo.
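
Both workflows can be rehearsed offline with a sketch like this one. It assumes local bare repositories as stand-ins for the repos on GitHub; central.git, fork.git and the branch names are made up, and the default branch is called main here. Opening and merging the pull request itself happens on GitHub and is not shown.

```shell
set -e
cd "$(mktemp -d)"
# Placeholder identity for every commit made in this script.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for the central repo on GitHub, with one commit on main.
git init -q --bare -b main central.git    # -b needs Git 2.28 or newer
git init -q -b main seed
cd seed
echo "# Project" > README.md
git add README.md
git commit -q -m "Add README"
git push -q ../central.git main
cd ..

# 1. Shared repo workflow: clone the central repo, work on a branch, push the branch.
git clone -q central.git shared_clone
cd shared_clone
git checkout -q -b add-methods
echo "Methods" > methods.md
git add methods.md
git commit -q -m "Add methods section"
git push -q -u origin add-methods         # main on the remote is untouched
cd ..

# 2. Fork and pull workflow: the fork is your own copy of the upstream repo.
git clone -q --bare central.git fork.git  # stand-in for clicking "Fork" on GitHub
git clone -q fork.git fork_clone
cd fork_clone
git remote add upstream ../central.git    # lets you fetch updates from the original
git checkout -q -b fix-title
echo "# The Project" > README.md
git commit -q -am "Fix title"
git push -q -u origin fix-title           # goes to the fork, not to upstream
git fetch -q upstream                     # stay up to date with the original
git remote -v
cd ..
```

In the second workflow the branch only exists in the fork until the project lead merges the pull request.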

1) Shared repo workflow

2) Fork and pull workflow

Merging branches

Two common errors

  • Push rejected. This can happen if the remote has changes that your local repo does not have yet.
    • Solution: Pull first. Resolve any conflicts. Then try your push again.

  • fatal: not a git repository. The command cannot be executed because the current directory is not a Git directory.
    • Solution: initialize the repo or change directory to the repo.
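
Both errors can be reproduced safely with a sketch like the following. It assumes a local bare repo as a stand-in for GitHub and two invented collaborators, alice and bob, who edit different files so that the pull merges without a conflict.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
# Placeholder identity for every commit made in this script.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare -b main remote.git     # -b needs Git 2.28 or newer
git init -q -b main alice
cd alice
echo "intro" > paper.md
git add paper.md
git commit -q -m "Add paper"
git remote add origin ../remote.git
git push -q -u origin main
cd ..
git clone -q remote.git bob

# Bob pushes a commit that Alice's local repo does not know about yet.
cd bob
echo "data notes" > data.md
git add data.md
git commit -q -m "Add data notes"
git push -q origin main
cd ../alice

echo "methods" > methods.md
git add methods.md
git commit -q -m "Add methods"

# Error 1: the push is rejected because the remote is ahead of Alice.
if git push -q origin main 2>/dev/null; then
  first_push="accepted"
else
  first_push="rejected"
fi

# Solution: pull first (here as a merge), then push again.
git pull -q --no-rebase --no-edit origin main
git push -q origin main

# Error 2: git commands fail in a directory that is not a repo.
cd "$(mktemp -d)"
if git status >/dev/null 2>&1; then
  outside_repo="is a repo"
else
  outside_repo="not a git repository"
fi
echo "first push: $first_push; new directory: $outside_repo"
```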

Some advice before we practice

  • Commit early and often.

  • Push to your remote on GitHub often (but not as often as you commit).

  • Establish a naming convention for commits.

  • Use tags to mark key steps.

  • Fork and clone from foreign repos (instead of “just cloning”)

  • Branch off your development version, especially in teams.
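
A sketch of what a naming convention and a tag could look like. The "area: what changed" message pattern and the tag name submitted-v1 are only one possible choice, not a rule from this course.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main                       # -b needs Git 2.28 or newer
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Commit early and often, with messages that follow one pattern.
echo "draft" > paper.md
git add paper.md
git commit -q -m "paper: add first draft"
echo "revised" > paper.md
git commit -q -am "paper: revise introduction"

# Tag a key step so that it is easy to find later.
git tag -a submitted-v1 -m "Version submitted to the journal"

git --no-pager log --oneline --decorate
tagged_commit=$(git rev-list -n 1 submitted-v1)
head_commit=$(git rev-parse HEAD)
```

Tags are not pushed by default; git push origin submitted-v1 (or git push --tags) sends them to the remote.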

Now it’s your turn: use the power of Git

Source: illustrations by @allison_horst

Lab Session III 👩‍💻 👨‍💻

Working with GitHub

20:00

See you